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Sequential,  adaptive,  and  gradient  diffusion  filters  are  implemented  into  spatial  multiscale  three-dimensional  variational  data 
assimilation  (3DVAR)  as  alternative  schemes  to  model  background  error  covariance  matrix  for  the  commonly  used  correction 
scale  method,  recursive  filter  method,  and  sequential  3D  VAR.  The  gradient  diffusion  filter  (GDF)  is  verified  by  a  two-dimensional 
sea  surface  temperature  (SST)  assimilation  experiment.  Compared  to  the  existing  DF,  the  new  GDF  scheme  shows  a  superior 
performance  in  the  assimilation  experiment  due  to  its  success  in  extracting  the  spatial  multiscale  information.  The  GDF  can  retrieve 
successfully  the  longwave  information  over  the  whole  analysis  domain  and  the  shortwave  information  over  data- dense  regions. 
After  that,  a  perfect  twin  data  assimilation  experiment  framework  is  designed  to  study  the  effect  of  the  GDF  on  the  state  estimation 
based  on  an  intermediate  coupled  model.  In  this  framework,  the  assimilation  model  is  subject  to  “biased”  initial  fields  from  the 
“truth”  model.  While  the  GDF  reduces  the  model  bias  in  general,  it  can  enhance  the  accuracy  of  the  state  estimation  in  the  region 
that  the  observations  are  removed,  especially  in  the  South  Ocean.  In  addition,  the  higher  forecast  skill  can  be  obtained  through  the 
better  initial  state  fields  produced  by  the  GDF. 


1.  Introduction 


In  general,  standard  three-dimensional  variational  data 
assimilation  (3DVAR)  can  be  formulated  as  the  minimization 
of  the  following  cost  function  [1,  2]: 


7  (x)  =  -  (x  -  xj,)  B  (x  -  X(,) 


(1) 


It  is  a  challenge  to  determine  B  in  any  data  assimilation 
including  3DVar.  The  spatial  structure  and  the  magnitude 
of  the  correction  for  the  state  variables  being  estimated  are 
determined  completely  by  B. 

Two  common  approaches  are  used  to  prescribe  B.  The 
first  approach  is  the  correlation  scale  method  (GSM)  [3],  in 
which  B  is  represented  by  the  Gaussian  function: 

B  =  A(x,y)exp^-^-^y  (2) 


where  x:  is  the  analysis  vector,  is  the  background  vector,  y 
is  the  observation  vector,  Li  is  an  interpolation  operator  from 
model  space  to  observation  space,  R  is  the  observational  error 
covariance  matrix,  indicates  transpose,  and  indicates 
inversion.  B  is  the  background  error  covariance  matrix. 


where  A  is  an  estimate  of  the  magnitude  of  the  background 
error,  and  are  the  distances  between  two  grid  points, 
and  L^  and  are  the  characteristic  length  scales  reflecting 
the  extent  of  spatial  correction  of  the  background  error  in  the 
A  and  y  directions,  respectively.  A  more  general  anisotropic 
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shape  with  an  ellipsoid  about  the  spatial  covariance  can 
also  be  found  [4].  It  is  noted  that  B  is  explicitly  generated 
statistically  using  the  correlation  scales  [3,  5-7].  However, 
limitations  of  the  CSM  are  (1)  positive  value  for  each  element 
in  B  (which  is  not  always  true),  (2)  nonexistence  of  B~^  unless 
using  sufficiently  small  correction  scales,  and  (3)  requirement 
of  large  computer  memory  to  store  B  since  every  element 
in  B  is  calculated  explicitly.  To  avoid  the  inversion  of  B  and 
to  speed  up  the  convergence  of  descent  algorithms  such  as 
the  steepest  descent  and  conjugate -gradient  methods,  a  new 
vector  w  is  introduced  by  Lorenc  [8]  and  Berber  and  Rosati 
[3],  defined  as 

w  -  B~^  (x  -  Xij)  .  (3) 

Then  the  cost  function  /  can  now  be  rewritten  as 

7  (w)  =  i  (HBw  -  df  (HBw  -  d) ,  (4) 

where  d  -  y  -  Hxi^  is  the  “innovation”  vector,  and  for 
simplification,  hereafter,  we  will  call  it  observation. 

Considering  B^  =  B,  (4)  is  equal  to 

7  (w)  =  ^u/Bw  +  I  (HBw  -  df  RT^  (HBw  -  d) .  (5) 

The  effect  of  Bw  in  (5)  can  be  modeled  by  applying  an 
equivalent  spatial  filter  on  w. 

The  second  approach  to  prescribe  B  is  the  recursive  filter 
method  (RFM)  [9], 

+  (1  -  a)  X^-  for  f  =  1, . . . ,  n, 

(6) 

Zi  =  -h  (1  -  a)  Yi  for  /  =  n, . . . ,  1, 

where  X^-  is  the  initial  value  at  grid  point  f ,  Y^  is  the  value  after 
filtering  for  f  =  1  to  n,  Z  is  the  initial  value  after  one  pass  of 
the  filter  in  each  direction,  and  a  is  the  filter  coefficient,  which 
determines  the  extent  of  spreading  of  observational  informa¬ 
tion  over  the  analysis  domain.  Multipass  filter  can  be  built  up 
by  repeated  application  of  (6).  Multidimensional  filter  can  be 
constructed  by  applying  this  one-dimensional  filter  in  each 
direction.  It  can  be  shown  [10]  that  such  multidimensional 
filter,  when  applied  with  several  passes,  can  accurately  model 
isotropic  Gaussian  error  correlations.  The  implementation 
using  recursive  filter  to  model  B  has  been  widely  used  due 
to  its  relatively  computational  inexpensiveness  [10-13]. 

An  outstanding  issue  of  either  CSM  or  RFM  is  its 
inefficiency  in  capturing  the  spatial  multiscale  information 
by  observations.  A  difficulty  in  practice  is  how  to  properly 
choose  the  characteristic  length  scales  and  in  the 
CSM  or  the  filter  coefficient  a  in  the  RFM.  Observational 
studies  show  that  (L^,  Ly)  change  with  location,  depth, 
and  time  [14-16].  If  they  are  too  large,  the  analysis  is  too 
smooth  and  shortwave  information  is  lost.  If  they  are  too 
small,  the  analysis  lacks  coherent  structure  in  data  sparse 
regions  because  the  longwave  information  cannot  be  properly 
corrected.  Thus,  in  the  past  it  has  been  thought  that  the 
characteristic  length  scales  (L^,  L^)  in  the  CSM  or  the  filter 
coefficient  a  in  the  RFM  is  responsible  for  the  unsatisfactory 
analysis  in  the  3DVAR. 


To  avoid  empirically  or  statistically  setting  the  character¬ 
istic  length  scales  and  to  correctly  minimize  the  longwave 
and  shortwave  errors  in  turn,  a  sequential  3DVAR  (S3DVar) 
method  was  developed  [17]  to  assimilate  sea  surface  temper¬ 
ature  (SST)  in  a  global  ocean  model  [18].  The  S3DVar  method 
is  simply  composed  of  a  series  of  3DVars,  each  of  which  uses 
recursive  filters  with  different  filter  coefficients.  These  3DVars 
sweep  through  all  resolvable  scales  by  observational  networks 
from  longwaves  to  shortwaves.  In  addition,  a  multigrid 
data  assimilation  scheme  was  also  introduced  to  extract  the 
resolvable  information  from  longwave  to  shortwave  in  an 
observational  system  [19].  Recently,  a  sequential  variational 
approach  based  on  the  multigrid  data  assimilation  method 
was  proposed  to  accurately  retrieve  the  multiscale  informa¬ 
tion  from  available  observation  systems  [20]. 

Since  the  matrix  B  is  treated  as  the  Gaussian  type  in 
the  CSM  and  modeled  as  the  diffusion  process  (or  Gaussian 
filtering  process)  in  the  RFM,  the  spread  of  the  information 
from  the  analysis  point  to  the  entire  region  is  interpreted 
as  the  diffusion  phenomenon  [21].  The  diffusion  filter  (DF) 
was  developed  on  the  base  of  the  Gaussian  diffusion  process 
and  therefore  can  be  used  directly  to  model  B.  Several 
spatial  multiscale  variational  analysis  schemes,  based  on  the 
modification  to  the  standard  DF  scheme,  are  proposed  in  this 
study.  As  a  pilot  study,  one  of  the  spatial  multiscale  variational 
analysis  schemes,  the  gradient  DF  (CDF),  is  used  to  assimilate 
SST  observations  into  an  intermediate  coupled  model  within 
a  perfect  “twin”  experiment  framework. 

The  paper  is  organized  as  follows.  The  methodology  of 
the  standard  DF  scheme  is  described  in  Section  2.  Several 
spatial  multiscale  DF  schemes  are  presented  in  Section  3. 
In  Sections  4  and  5,  simple  observing/assimilation  system 
simulation  experiments  and  global  SST  simulation  with 
an  intermediate  coupled  atmosphere-ocean-land  model  are 
conducted  to  evaluate  one  of  the  new  DF  schemes,  that  is,  the 
gradient  DF  (CDF),  on  the  model  estimation  and  forecast. 
The  conclusions  are  summarized  in  Section  6. 

2.  Diffusion  Filter 

The  DF  is  in  fact  a  Gaussian  filter.  Given  the  following  initial 
value  problem  for  one-dimensional  diffusion  equation 

du  _  d^u 
dt  ^dx^' 

u-w{x), 

t  =  0, 

where  a  >  0  is  the  diffusion  coefficient,  assumed  to  be 
constant.  Its  solution  can  be  formulated  by  the  convolution 
of  w{x)  with  a  Gaussian  kernel  G(x,  t)\ 

u  (x,  t)  -u){x)  ^  G  (x,  t) ,  (8) 

where  (*)  indicates  convolution,  G(x,  t)  =  (1/  V^o')e“^  , 

(7  =  That  is,  w(x,  t)  is  equivalent  to  applying  a  Gaussian 
filter  on  initial  value  w{x).  The  second  moment  of  the  filter 
kernel  is  =  2at,  which  characterizes  the  intrinsic  spatial 
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scale.  And  is  only  determined  by  diffusion  coefficient  a 
when  “time”  duration  t  is  set  to  be  constant,  which  implies 
that  the  larger  the  value  of  a  is,  the  lower  the  frequency 
information  of  w{x)  would  be  acquired  by  w(x,  t). 

Generally,  in  a  two-dimensional  finite  domain,  the  diffu¬ 
sion  model  can  be  written  by 


du  d^u  ,  d^u 
—  =  a— -r  +  h— -r 
ot  ox^  oy^ 


{x,y)  G  n,  t  G  (0,S] , 


u  =  w{x,y)  {x,y)  G  n,  t  =  0, 


(9) 


(x,  y)  G  r. 


(1)  Choose  an  appropriate  diffusion  coefficient  a;  give  the 
initial  guess  of  u;  (u;  =  0,  for  instance). 

(2)  Integrate  the  diffusion  model  (9)  from  “time”  t  =  0  to 
S  to  obtain  u^(w). 

(3)  Calculate  /  according  to  (12). 

(4)  Integrate  the  adjoint  model  (11)  from  “time”  t  =  S  to 
0  to  obtain  R^{w);  then  the  gradient  g{w)  of  the  cost 
function  /  is  -R^{w). 

(5)  Use  descent  algorithms  to  adjust  w. 

(6)  Loop  from  step  (2)  until  the  convergence  criterion  is 
met. 


where  C  =  [0,  D]  x  [0,  H],  H  =  n\r  is  the  interior  domain  of 
n,  r  is  the  boundary  of  C,  n  is  the  outer  normal  direction  off, 
and  a  and  b  are  the  diffusion  coefficients  in  x  and  y  directions, 
respectively. 

If  u^{w)  denotes  the  cost  function  (5)  then 

becomes 


/  (w)  =  (^) 


+  -  (Hm^  (w)  -  df  [Hu^  (w)  -  d) . 


(10) 


Now  the  analysis  is  converted  to  the  problem  of  optimizing 
the  initial  value  of  the  diffusion  equation  (9).  To  do  so, 
we  need  the  gradient  of  the  cost  function,  which  can  be 
derived  by  using  adjoint  methods,  just  as  four-dimensional 
variational  (4DVAR)  data  assimilation  usually  does. 

For  convenience  of  illustration,  a  continuous  adjoint 
system  is  considered  and  4  is  omitted.  It  is  also  assumed 
that  the  observations  are  located  at  analysis  points  and  H  is 
the  identity  matrix.  Then  the  adjoint  of  the  tangential  linear 
model  of  (9)  takes  the  following  form: 


dR  d^R  ,d^R  ,  ,  ,  ^ 

- - {x,z)  e  Cl,  0  <  t  <  S, 

Ot  ox^  oy^ 

R  (x,  y,  S)  =  0,  (Xy  z)  G  n, 

dR 


where 


dn 


=  0  {xy  y)  G  r. 


f- 


dres  (U))  -  d-U^  (w)  t  -  Sy 

I  0  t  <S. 


(11) 


(12) 


Note  that  is  the  observation  residue,  which  character¬ 

izes  the  remaining  observational  signals  after  the  abstracted 
information  at  current  solution  Wy  u^{w)y  has  been  removed 
form  observations  d,  and  d^^^(w)  is  set  to  be  zero  at  the  grid 
points  with  no  observations. 

The  gradient  of  /  with  respect  to  w  is  g{w)  -  -R^{w)y 
where  R^{w)  is  the  initial  value  of  the  adjoint  variables.  Once 
the  adjoint  model  is  available,  the  analysis  can  be  performed 
in  the  following  steps. 


Use  of  DF  for  determining  the  matrix  B  is  called  the  DF 
method  (DFM),  which  has  the  same  computation  loads  as 
the  RFM  if  the  ADI  difference  scheme  (or  the  other  operator 
splitting  scheme;  see  Appendix)  is  applied  to  calculate  the 
diffusion  equation  (9).  The  diffusion  filter  scheme  has  the 
same  problem  as  the  recursive  filter  scheme  in  extracting 
observational  information.  As  the  extent  of  spatial  dispersion 
is  only  determined  by  diffusion  coefficient  a  when  “time” 
duration  t  is  set  to  be  constant,  if  a  is  large,  the  shortwave 
information  will  be  lost.  Conversely,  if  a  is  small,  the  long¬ 
wave  information  will  not  be  properly  captured.  Obviously, 
the  diffusion  coefficient  a  plays  the  same  role  as  the  filter 
coefficient  a  does  in  the  recursive  filter  scheme. 

3.  Spatial  Multiscale  Diffusion  Filters 

To  retrieve  longwave  information  over  the  whole  domain 
and  shortwave  information  over  data- dense  regions,  three 
spatial  multiscale  variational  analysis  schemes,  based  on  the 
diffusion  filter,  are  proposed. 

3.1.  Sequential  Diffusion  Filter  (SDF).  The  sequential  diffu¬ 
sion  filter  (SDF)  scheme  is  similar  to  the  S3DVar  method 
derived  by  Xie  et  al.  [17].  The  SDF  scheme  uses  a  sequence  of 
3DVars  to  obtain  the  final  estimation  to  retrieve  information 
from  all  wavelengths  from  long-  to  shortwaves  in  turn. 
The  matrix  B  is  modeled  by  applying  the  diffusion  filter 
sequentially  in  x  and  y  direction,  respectively.  SDF  begins 
its  sequence  with  a  big  value  of  the  diffusion  coefficient 
a;  then  an  initial  estimation  is  obtained  through  analyzing 
the  observed  data.  After  that,  a  S3DVar  is  solved  using  the 
diffusion  filter  with  a  smaller  a  than  before.  For  the  S3DVars, 
observations  to  be  assimilated  are  produced  by  subtracting 
the  previously  analyzed  values  from  the  observations  assim¬ 
ilated  by  the  previous  3DVar  until  the  diffusion  coefficient  a 
is  small  enough.  The  final  estimation  is  the  summation  of  all 
the  previous  3Dvar  analyses  based  on  the  diffusion  filter. 

From  the  above  description,  it  is  noted  that  the  SDF 
scheme  is  a  simple  extension  of  the  DF,  in  which  information 
is  retrieved  step  by  step  from  long-  to  shortwaves.  During  the 
process  of  the  SDF,  B  is  changed  gradually  with  the  different 
diffusion  coefficient  a  and  thus  becomes  flow  dependent 
and  anisotropic  following  the  multiscale  information  of  the 
observation. 
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3.2.  Adaptive  Diffusion  Filter  (ADF).  Due  to  the  introduction 
of  the  heat  diffusion  equation,  the  gradient  of  the  cost  func¬ 
tion  with  respect  to  the  state  variables  can  be  obtained  using 
the  adjoint  method  with  4DVar.  In  general,  the  diffusion 
coefficients  a(x,  y)  and  b(x,  y)  are  not  constants  but  are  space 
dependent.  Therefore,  it  is  possible  to  optimize  not  only  the 
state  variables  but  also  the  diffusion  coefficients  using  4DVar. 
State  variables  and  diffusion  coefficients  are  used  together  as 
control  variables,  so  values  of  a(x,  y)  and  h{x,  y)  will  change 
adaptively  according  to  the  distribution  of  observations. 

Set  4^,4^, 4  as 


i  e  [0,  /] ;  j  &  [0,  /] , 
^X,j  =  0- 
KKj  =  0. 

AyRlo  =  0, 

AyRl,  =  0, 

n  6  [0,iV  -  1] , 


4.= 


Xj  =  ihi  I  (■  =  0, l,...,4/^l  =  y 


4  = 


= 


yi  =  jhi  I  j  =  0,1,. ..,7,4  =  j 


4  =  MT  I  n  =  0, 1, . . . ,  N,  T  =  — 


(13) 


X 


A^R 


n+l/2 


A^R 


n+l/2 

Id 


A  nn+lll 


AyR 


n+l/2 

ij 


=  0, 

=  0, 
=  0, 

=  0, 


n  e  [0,]V-  1], 

(15) 


The  cost  function  is  transferred  to  the  following  form: 


7  (w,  a,  b) 

i-i  7-1 

2 


1  1-17-1 


i=l  j=l 


1  ^ 

^  m=l 


(14) 


Am) 


i=i  j=i 


’(m) 


where  M  is  the  number  of  observations  and  is  the 
interpolation  coefficient  of  the  grid  point  (f ,  j)  with  respect  to 
the  mth  observation,  p^^^  is  the  mth  element  of  the  diagonal 
matrix  R~^.  For  calculating  the  gradients  of  the  cost  function 
7  with  respect  to  u),a,b  in  (12),  the  discrete  adjoint  models 
of  (A.l)-(A.ll)  should  be  deduced  firstly  according  to  the 
Lagrange  multiplier  method. 
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t/2 
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+  dujAx  {RTA^  Kj))  = 


«  e  [0,N-  1];  f  6  [1,7-  1];  ;  6  [1,7-  1] , 


where 
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M 

1-17-1  1 

m=l 

i=l  7=1  J 

0 

n  =  N,  (16) 
n  <  N. 


The  gradients  of  the  cost  function  /  with  respect  to  u;,  a,  b  can 
be  expressed  as  follows: 


dj 

dW: 


t,J 


^  n=l 


(17) 


N-1 


1  X"*  /"I"  n  "7"  riti  ,  "7"  n+l~r  r)n+l/2\ 

~  2  ^  ^  ^y^i,j  ^y^Uj  )  • 

^  n=l 


The  process  for  the  state  estimation  with  the  4DVar  is  outlined 
as  follows,  (a)  Begin  with  the  initial  w,a,b.  (b)  Integrate 
the  model  equations  (A.l)-(A.ll)  forward  into  a  fixed  time 
window  and  calculate  the  value  of  the  cost  function  J(w,  a,  b) 
using  (14).  (c)  Integrate  the  adjoint  model  (15)  backward 
in  time  and  calculate  the  values  of  the  gradient  of  the 
cost  function  with  respect  to  the  control  variables  V /  using 
(17).  (d)  With  the  values  of  the  cost  function  J{w,a,b)  and 
the  gradient  V/,  use  the  Broyden-Fletcher-Goldfarb-Shanno 
(BFGS)  quasi-Newton  minimization  algorithm  to  obtain  the 
new  values  of  the  control  variables,  namely,  the  two  diffusion 
coefficients  a,  b  and  the  state  variables  w.  (e)  With  the  updated 
control  variables  from  process  (d),  repeat  processes  (b),  (c), 
and  (d)  until  the  convergence  criterion  for  the  minimization 
is  satisfied. 
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3.3.  Gradient  Diffusion  Filter  (GDF).  The  algorithm  is  a 
variant  of  the  spatial  multiscale  recursive  filter  [22] .  For  small 
diffusion  coefficients  a,  f?,  the  gradient  contains  not  only  all 
the  observational  signals  from  longer  to  shorter  wavelengths, 
but  also  a  lot  of  erroneous  signals  in  data  sparse  regions, 
which  causes  lack  of  coherent  longwave  structure  in  space. 
If  this  gradient  is  simply  introduced  into  the  minimization 
algorithm  without  careful  considerations,  the  analysis  departs 
far  from  reality.  Thus,  a  prerequisite  for  the  minimization 
algorithm  used  in  3DVAR  is  needed  to  extract  the  longwave 
information  from  the  gradient  and  at  the  same  time  to 
preserve  the  valuable  shortwave  signals. 

However,  the  longwave  information  implied  in  the  gra¬ 
dient  cannot  be  made  best  use  of  to  construct  a  reasonable 
descent  direction  in  general  minimization  algorithms.  Take 
the  steepest  descent  algorithm  as  an  example,  in  which  the 
descent  direction  is  simply  chosen  as  -g(w).  Suppose  the 
initial  guess  of  w  (i.e.,  Wq)  is  equal  to  zero.  Then  at  the  fth  iter¬ 
ation,  the  new  solution  Wi  =  Wi_i  +li_i  *  {-g{w^_i))  is  obtained 
by  using  a  line  search  algorithm  to  find  an  appropriate  step 
size  li_i.  According  to  what  have  been  indicated,  the  gradient 
giwo)  actually  represents  certain  scales  of  observations  d,  and 
these  scales  will  be  extracted  by  the  line  search  at  the  first 
iteration  and  incorporated  into  a  new  solution  .  However, 
if  the  diffusion  coefficients  a,  h  are  small,  the  gradient  giiv^) 
will  lack  coherent  structure  in  data  sparse  regions  though  it 
actually  carries  all  observational  signals.  And  since  the  new 
solution  Wi  is  simply  obtained  along  the  descent  direction, 
-giwo),  the  same  problem  will  also  exist  in  which 
indicates  that  the  longwave  information  of  observations  d 
is  not  effectively  extracted  from  the  gradient  giiv^)  at  the 
first  iteration.  Similarly,  at  the  second  iteration,  the  longwave 
information  of  the  observation  residue  after  the  first  iteration 
will  not  be  extracted  from  the  gradient  giiv^)  and  incorpo¬ 
rated  into  the  new  solution  W2,  and  so  on.  As  a  consequence, 
in  data  sparse  regions,  the  final  analysis  will  also  lose  the 
longwave  structure  of  observations.  The  same  problem  also 
exists  for  other  minimization  algorithms  such  as  BFGS  and 
the  conjugate  gradient  method,  for  the  same  reason. 

The  GDF  scheme  is  designed  to  effectively  retrieve  the 
longwave  information  over  the  whole  domain  and  shortwave 
information  over  data-dense  regions.  Since  the  gradient 
carries  all  observational  information,  the  main  idea  of  this 
new  scheme  is  to  apply  the  diffusion  filter  on  the  gradient  to 
extract  the  implied  longwave  signal.  While  the  diffusion  coef¬ 
ficient  decreases  continuously  with  iteration,  the  multiscale 
information,  from  long  to  short  wavelengths,  can  be  extracted 
successively.  The  algorithm  is  designed  as  follows: 

(1)  Give  an  initial  guess  oi  w  {i.e.,  Wq)  which  equals  zero. 
Then  select  diffusion  coefficients  a,h  as  small  ones 
and  give  a  large  enough  value  to  an  extra  diffusion 
coefficient  denoted  as 

(2)  Use  the  diffusion  filter  with  coefficient  a,  b  to  calculate 
Bw  in  (5). 

(3)  Calculate  the  difference  between  observations  d  and 
HBw,  namely,  the  observation  residue. 


(4)  Calculate  the  gradient  g  of  the  cost  function  /  with 
respect  to  w  using  the  DF  through  the  adjoint  model. 

(5)  Apply  the  diffusion  filter  with  coefficient  /3  on  -g 
to  calculate  the  descent  direction  E{-g),  where  E 
represents  a  positive  definite  operator. 

(6)  Select  E{-g)  as  the  descent  direction,  and  use  line 
search  algorithm  to  find  the  step  size,  /;  then  w  is 
adjusted  to  u;  =  u;  -1-  /  *  E{-g). 

(7)  The  value  of  (3  diminishes. 

(8)  Loop  from  step  (2)  until  the  convergence  criterion  is 
met. 

If  the  background  term  is  involved  in  the  cost  function 
7,  the  same  procedure  is  performed  except  that  g  calculated 
in  step  (4)  is  the  gradient  of  cost  function  /,  which  includes 
both  7^  and  J^. 

4.  Observing/ Assimilation  System 
Simulation  Experiments 

Observing/assimilation  system  simulation  experiments  are 
performed  to  evaluate  the  spatial  multiscale  variational  anal¬ 
ysis.  The  “truth”  field  in  these  experiments  is  represented  by 
an  analytic  temperature  field  defined  over  the  area  of  100°E- 
110°E  and  30°N-40°N.  The  “truth”  field  of  the  temperature 
is  plotted  in  Eigure  1(b),  whose  high  nonlinearity  can  be 
seen  from  Eigure  1(a).  The  grid  resolution  is  set  to  1/8°  x 
1/8°,  and  the  total  numbers  of  the  grid  are  80  x  80.  The 
observational  dataset  is  generated  using  the  analytic  solution. 
Observational  error  is  simulated  by  adding  a  sample  of  white 
noise  with  a  standard  deviation  of  0.2  to  the  “truth.”  Three 
experiments  are  conducted  in  which  different  configurations 
of  numbers  of  observations  are  employed. 

4.1.  Experiment  1.  In  this  experiment,  the  number  of  observa¬ 
tions  is  set  to  2000  at  first,  and  the  observations  are  randomly 
and  uniformly  distributed  in  the  whole  domain,  which  can 
be  seen  from  the  black  dots  in  Figure  1(b).  In  the  experiment 
with  DF,  several  values  of  the  diffusion  coefficient  are  used 
to  verify  the  impacts  on  the  analyzed  field.  In  the  experiment 
with  GDF,  the  processes  (l)-(8)  described  in  Section  3.3  are 
conducted.  The  diffusion  coefficients  a,  b  are  set  to  a  small 
value,  of  0.1,  which  suggests  almost  all  the  observational 
signals,  from  long  to  short  wavelengths,  can  be  retrieved. 
However,  a  large  enough  value,  1.0,  is  given  to  the  extra 
diffusion  coefficient  /3  of  the  gradient  at  the  first  step.  For  the 
subsequent  steps,  /3  is  reduced  by  0.1  from  the  previous  step. 
At  the  last  step,  (3  becomes  0.1,  which  is  small  enough  for  the 
case.  The  limited  memory  BFGS  quasi-Newton  minimization 
algorithm  [23]  is  used  during  the  minimizing  procedure. 

The  major  scales  of  the  truth  field  are  reconstructed  by 
2000  observations  almost  fully  using  the  GDF  (Figure  2(a)), 
but  not  well  reconstructed  using  the  DF  with  different  diffu¬ 
sion  coefficients  (a,  b):  1.0  (Figure  2(b)),  0.5  (Figure  2(c)),  and 
0.1  (Figure  2(d)).  The  small  scale  features  begin  to  dominate 
the  analyzed  fields  when  the  diffusion  coefficients  are  reduced 
gradually,  while  the  large  scale  signals  are  contaminated 
dramatically  by  an  abundance  of  small  scale  features. 
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Figure  1:  The  true  temperature  field  to  be  analyzed  (unit:  °C):  (a)  latitudinal  variation  along  100°E  and  (b)  ichnography  image.  Black  dots  in 
the  panel  (b)  show  the  distribution  of  2000  random  observations. 


As  He  et  al.  [18]  indicated,  artificial  signals  can  be 
produced  during  the  data  assimilation  if  the  chosen  diffusion 
coefficient  cannot  represent  the  actual  scale.  In  contrast, 
the  GDF  can  handle  spatial  multiscale  analysis  pretty  well 
compared  to  the  simple  DF  with  a  fixed  diffusion  coefficient. 
In  addition,  the  GDF  is  easy  to  avoid  in  empirical  selection  of 
the  diffusion  coefficient. 

Figure  3  shows  the  performance  of  GDF  and  DF  when 
the  number  of  observations  decreases  from  2000  to  500.  The 
GDF  (Figure  3(a))  can  retrieve  large  scale  information  from 
observations  and  leave  the  unresolved  scale  as  errors  on  top  of 
the  resolvable  scales.  These  errors  are  smaller  than  those  gen¬ 
erated  by  DF  with  a  fixed  diffusion  coefficient  (Figure  3(b)) 
in  the  condition  of  the  sparseness  of  the  observations  and  the 
lack  of  information. 

4.2.  Experiment  2.  The  second  experiment  is  conducted  with 
removal  of  observations  in  the  area  of  103°E~107°E  and  35°N~ 
40°N  (Figure  4)  to  further  evaluate  the  GDF  capability  in 
retrieving  the  multiscale  information  from  observations.  The 
analyzed  field  of  the  GDF  (Figure  5(a))  performs  much  better 
in  the  data  void  region  than  that  of  the  DF  with  (a,  h)  =  0.8 
(Figure  5(b))  and  0.5  (Figure  5(c)).  The  GDF  can  reconstruct 
the  temperature  field  (Figure  5(a))  reasonably  well  despite 
the  absence  of  the  observations  in  the  region  as  shown  in 
Figure  4.  The  spatial  pattern  of  the  whole  temperature  field 
can  be  captured  roughly  according  to  the  large  scale  informa¬ 
tion  derived  from  all  the  observations  in  the  whole  analyzed 
region.  However,  the  DF  fails  to  reconstruct  the  temperature 
field  and  produces  false  features  especially  in  the  data  void 
region.  For  example,  a  strong  cold  tongue  is  produced  for 
(a,h)  =  0.8  (Figure  5(b)),  and  large  scale  temperature  field  is 
distorted  with  displacement  of  the  thermal  front  in  the  data 
void  region  for  (a,  b)  =  0.5  (Figure  5(c)).  Little  information  of 


the  observations  can  be  extracted  from  data  rich  area  to  the 
data  void  region  using  DF. 

Such  capabilities  make  the  GDF  invaluable  to  get  well 
represented  values  for  the  data  void  (or  insufficiently  cov¬ 
ered)  areas  such  as  a  typhoon-affected  area  during  typhoon 
passage  or  the  Southern  Hemisphere  Oceans  (compared  to 
other  ocean  basins).  The  GDF  can  reconstruct  the  analyzed 
field  roughly  according  to  the  longwave  information  of  the 
observations  beyond  the  data  void  area  such  as  typhoon- 
affected  region  or  the  Southern  Ocean.  On  the  other  hand, 
both  the  standard  DF  and  the  traditional  RF  may  lead  to  false 
results  in  the  data  void  region,  as  shown  in  Figures  5(b)-5(c); 
an  improper  analysis  is  also  likely  to  be  produced,  which  will 
affect  the  analysis/forecast  accuracy  seriously. 

In  addition,  several  classical  geostatistical  tools,  such  as 
inverse  distance  to  a  power,  triangulation  with  linear  inter¬ 
polation,  and  Kriging  method  are  used  to  interpolate  such 
observations  (no  white  noise  is  imposed  on  the  observations). 
Compared  to  the  other  two  geostatistical  tools,  the  Kriging 
method  is  able  to  accurately  fill  in  the  hidden  information 
(Figures  6(a)  and  6(b)  versus  6(c)).  However,  compared  with 
the  variational  method,  the  geostatistical  tools  have  a  limited 
application  and  cannot  handle  corrections  between  different 
analysis  variables  or  physical  balances  and  other  constraints 
[20]. 

5.  Global  SST  Assimilation  Using  GDF 

In  this  section,  we  apply  the  GDF  to  assimilate  the  SST 
into  an  intermediate  climate  model  to  improve  the  climate 
representation  and  forecast. 

5.1.  Brief  Description  of  an  Intermediate  Atmosphere-Ocean- 
Land  Coupled  Model.  An  intermediate  atmo sphere -ocean- 
land  coupled  model  [24]  is  employed  as  the  first  step  to 
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Figure  2:  The  analyzed  temperature  fields  (unit:  °C)  using  (a)  GDF,  (b)  DF  with  a^b  =  1.0,  (c)  DF  with  a,  =  0.5,  and  (d)  DF  with  a,  =  0.1. 


examine  the  GDF.  Despite  limitations  in  the  representations 
of  some  basic  physical  processes  such  as  the  absence  of 
ENSO  dynamical  mechanism,  the  model  is  of  sufficient 
mathematical  complexity  for  the  purposes  of  this  study.  The 
intermediate  coupled  model  has  some  successful  applications 
in  coupled  data  assimilation  fields  recently.  For  example,  Wu 
et  al.  [25]  investigated  the  impact  of  the  geographic  depen¬ 
dence  of  observing  system  on  parameter  estimation,  and 
Zhang  et  al.  [26]  studied  parameter  optimization  when  the 
assimilation  model  contains  biased  physics  within  a  biased 
assimilation  experiment  framework.  The  configuration  of  the 
model  is  presented  here.  The  atmosphere  is  represented  by 


a  global  barotropic  spectral  model  based  on  the  potential 
vorticity  conservation: 

A  (T^  -  ni//)  ocean  •  surface 

(18) 

A  [Ti  -  fu//)  land  •  surface, 

where  q  -  /3y  -h  /3  =  df  jdy,  f  is  the  Coriolis 

parameter,  y  is  the  meridional  distance  from  the  equator 
(northward  positive),  and  y/  is  the  geostrophic  atmosphere 
stream  function,  is  a  scale  factor  which  converts  stream 
function  to  temperature.  A  is  the  flux  coefficient  from  the 
ocean  (land)  to  the  atmosphere.  and  denote  SST 


dq 


+  J{f,q) 
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Figure  3:  The  analyzed  temperature  fields  (unit:  °C)  with  500  observations:  analyzed  fields  using  (a)  GDF  and  (b)  DF  with  a,b  =  1.0. 


and  land  surface  temperature  (LST),  respectively.  Wu  et  al. 
[27]  used  the  nonlinear  atmospheric  model  to  develop  a 
compensatory  approach  of  the  fixed  localization  in  EnKF 
analysis  to  improve  short-term  weather  forecasts. 

The  ocean  is  composed  of  a  1.5 -layer  baroclinic  ocean 
with  a  slab  mixed  layer  [28]  as 


dt 


(19) 


=  -KrT„  +  ArVX  +  s  (t,  t)  +  C„  (r„  -  , 


where  0  is  the  oceanic  stream  function  and  Lq  =  0^1 
is  the  oceanic  deformation  radius,  with  g  and  Hq  being 
the  reduced  gravity  and  mean  thermocline  depth,  y  denotes 
momentum  coupling  coefficient  between  the  atmosphere  and 
ocean.  is  the  horizontal  diffusive  coefficient  of  0.  Kj^ 
and  Ay  are  the  damping  coefficient  and  horizontal  diffusive 
coefficient  of  =  Kj  x  k  x  f/g'  [29],  where  k  is  the 

ratio  of  up  welling  to  damping.  is  the  flux  coefficient  from 
the  atmosphere  to  the  ocean.  s(t,  t)  is  the  solar  forcing  which 
introduces  the  seasonal  cycle. 

The  evolution  of  land  surface  temperature  (LST)  is  given 
by 

m^r,  =  -KjT^  +  Ajy\  +  s  (T,  t)  +  Cl  {Ti  -  ^if) .  (20) 

where  m  represents  the  ratio  of  heat  capacity  between  the 
land  and  the  ocean  mixed  layer,  and  are  damping  and 
diffusive  coefficients  of  respectively,  and  Ci  denotes  the 
ffux  coefficient  from  the  atmosphere  to  the  land. 


Figure  4:  The  distribution  of  2000  random  observations,  but 
removal  of  the  observations  located  in  the  range  of  103°N~107°N  and 
35°E-^40°E. 


All  the  three  model  components  adopt  64  x  54  Gaussian 
grid  and  are  forwarded  by  a  leap  frog  time  stepping  with  a  half 
hour  integration  step  size.  There  are  2287  and  1169  grid  points 
over  the  ocean  and  land,  respectively.  An  Asselin-Robert  time 
filter  [30,  31]  is  introduced  to  damp  spurious  computational 
modes  in  the  leap  frog  time  integration.  Default  values  of  all 
parameters  are  listed  in  Table  1  in  Wu  et  al.  [24] . 

Starting  from  initial  conditions  ZO  =  (i//®,  0^,  T^,  T^^), 
where  i//®,  0®,  T^,  and  T®  are  zonal  mean  values  of  correspond¬ 
ing  climatological  fields,  the  coupled  model  is  run  for  60  years 
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to  generate  the  model  states  Z\  -  (i//\  T^^).  The  last  10 

years  model  states  (Zl)  are  used  as  the  “truth”  fields.  Figure  7 
shows  the  annual  mean  of  y/  (Figure  7(a)),  0  (Figure  7(b)), 
Tq  and  Ti  (Figure  7(c)),  where  the  associated  wave  trains  in 
the  y/  field  are  observed.  For  0,  one  can  see  the  distinct 
pattern  of  the  western  boundary  currents,  gyre  systems  and 
the  Antarctic  Circumpolar  Current  (ACC).  For  and  T^, 
reasonable  temperature  gradients  are  also  produced.  Note 
that  the  low  temperature  in  tropical  lands  can  be  attributed 
to  the  linear  damping  of  in  the  solar  forcing.  The  above 
model  configuration  is  called  the  “truth”  model,  which  has 
reasonable  but  rough  representation  for  the  basic  climate 
characteristics  of  the  atmosphere,  land,  and  ocean. 


5.2.  Model  “Bias”  Arising  from  the  Initial  States.  However, 
starting  also  from  the  same  initial  conditions  ZO,  the  Gaus¬ 
sian  random  numbers  are  added  to  y/^  and  (jf,  with  standard 
deviations  of  lO^m^s”^  (for  y/^)  and  10^  m^s“^  (for  0®), 
respectively.  The  coupled  model  is  also  run  for  60  years 
to  generate  the  model  states  Z2  =  (i//^,0^,T^,T^^).  The 

last  10  years  model  states  are  used  for  analysis.  This  model 
configuration  is  called  the  biased  model. 

The  model  “biases”  induced  by  perturbed  initial  fields  are 
examined.  Figure  8  shows  time  series  of  the  spatial  averaged 
root  mean  square  errors  (RMSEs)  of  i//,  0,  T^,  and  for  the 
assimilation  model,  which  are  calculated  according  to  the 
difference  in  the  assimilation  model  and  the  “truth”  model. 
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Figure  6:  Analyzed  temperature  results  (unit;  °C)  from  classical  geo  statistical  tools  such  as  (a)  inverse  distance  to  a  power,  (b)  triangulation 
with  linear  interpolation,  and  (c)  Kriging  method. 


The  obvious  difference  about  all  the  four  components  can 
be  seen  from  Figure  8.  The  RMSE  of  y/  reaches  about  1.6  x 
lO^m^s”^  with  a  high  frequent  oscillation  (see  Figure  8(a)). 
In  contrast,  the  RMSE  of  </>  performs  smoothly  and  rapidly 
decreases  within  the  first  year  and  gradually  reaches  a  low  and 
stable  value  about  10^  m^s“^  (see  Eigure  8(b)).  The  RMSEs  of 
Tq  (Eigure  8(c))  and  (Eigure  8(d))  increase  rapidly  in  the 
first  year,  which  are  generated  by  the  initially  perturbed  y/^ 
and  (j6^  through  the  coupling.  High  frequency  oscillation  is 
noted  in  the  time  series  of  the  RMSE  of  which  indicates 
that  the  land  surface  temperature  is  dominated  by  the 
atmospheric  motion  (i//).  However,  time  series  of  the  RMSE 
of  is  much  smoother  than  that  of  indicating  that  is 
modulated  by  the  oceanic  motion  (0). 


The  spatial  distribution  of  RMSE  of  for  the  biased 
model  (Eigure  9)  shows  a  notable  bias  in  the  ACC  region  of 
the  Southern  Ocean  with  a  maximum  value  over  15  K  near  the 
southern  tip  of  Africa.  Besides,  obvious  biases  also  exist  in  the 
west  boundaries  of  the  ocean  in  the  subtropical  regions. 

5.3.  Twin  Experiment  Design.  In  this  section,  with  the 
intermediate  model  and  DE/GDE  data  assimilation  scheme 
described  above,  a  perfect  twin  experiment  framework  is 
designed  with  the  assumption  that  the  errors  of  initial  model 
states  are  the  only  source  of  assimilation  model  biases. 
Starting  from  the  model  states,  Zl,  described  in  Section  5.1, 
the  “truth”  model  is  run  for  1  year  to  generate  the  time 
series  of  the  “truth”  states.  Only  synthetic  observations  of 
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Figure  7:  Annual  mean  of  (a)  atmospheric  stream  function  (unit:  m^s  ^),  (b)  oceanic  stream  function  (unit:  m^s  ^),  and  (c)  sea  surface 
temperature  and  land  surface  temperature  (unit:  K). 
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Figure  8:  Time  series  of  RMSEs  of  (a)  atmospheric  stream  function,  (b)  oceanic  stream  function,  (c)  sea  surface  temperature,  and  (d)  land 
surface  temperature  for  the  assimilation  model. 
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Figure  9:  The  spatial  distribution  of  RMSE  of  sea  surface  tempera¬ 
ture  for  the  biased  model. 


are  produced  through  sampling  the  “truth”  states  at  specific 
observational  frequencies.  A  Gaussian  white  noise  is  added 
for  simulating  observational  errors.  The  standard  deviations 
of  observational  errors  are  0.5° C  for  T^.  The  sampling  period 
is  24  hours.  The  “observation”  locations  of  are  global 
randomly  distributed  with  the  same  density  of  the  ocean 
model  grid  points. 

The  biased  model  uses  the  biased  initial  fields  depicted 
in  Section  5.2.  Starting  from  the  biased  model  states,  Z2, 
the  experiment  E_GDF  consists  on  assimilating  observations 
into  model  states  using  the  GDF  scheme.  In  comparison, 
the  experiment  E_DF  is  carried  out,  where  the  standard  DF 
scheme  is  used  with  the  diffusion  coefficients  a,  =  0.5.  In 
addition,  a  control  run  without  any  observational  constraint, 
called  CTRL,  serves  as  a  reference  for  the  evaluation  of 
assimilation  experiments. 

5.4.  Impact  of  the  GDF  on  the  Estimate  of  the  States.  The 
performance  of  GDF  is  investigated.  Figures  10 (a) -10(d) 
show  time  series  of  RMSEs  of  i//,  0,  T^,  and  for  the  CTRL 
(solid  line)  and  the  GDF  (dash  line).  Compared  to  the  CTRL 
(solid  line  in  Figure  10(c)),  of  the  GDF  has  significant 
improvement  (dash  line  in  Figure  10(c)),  in  which  the  RMSE 
decreases  to  approximately  0.5  K.  Figure  11  presents  the  spa¬ 
tial  distributions  of  RMSEs  of  using  GDF.  The  RMSE  of 
over  ocean  is  obviously  reduced  compared  with  that  of 
the  CTRL  (see  Figure  9),  especially  in  the  Southern  Ocean, 
the  subtropical  and  the  subpolar  regions.  In  particular,  the 
reduction  of  RMSE  is  much  significant  in  the  ACC  region, 
in  which  the  RMSE  decreases  from  above  15  K  to  below  3  K. 

Unlike  the  RMSEs  of  T^,  there  is  no  direct  observa¬ 
tions  constraint  for  i//,  0,  and  Tf,  therefore,  their  RMSEs 
decrease  gradually  owing  to  the  effect  of  the  coupling.  The 
RMSEs  of  y/  for  the  GDF  are  reduced  significantly  from 
about  1.6  X  10^  m^s“^  to  about  1.1  x  10^  m^s“^  with  a  high 
frequent  oscillation  (see  Figure  10(a)).  The  f  in  GDF  is  also 
improved  significantly  comparing  to  CTRL  (solid  versus  dash 
lines  in  Figure  10(b)),  whose  RMSE  decreases  gradually  and 
smoothly,  but  it  does  not  reach  a  stable  value  within  the 
experimental  period,  indicating  that  the  low  frequency  signal 
needs  a  much  longer  time  to  reach  equilibrium  compared 
to  the  high  frequency  signal.  For  T^,  the  GDF  reduces  the 
error  by  approximately  60%.  Note  that  has  no  direct  effect 
on  T^,  which  can  be  realized  according  to  the  framework 


of  the  coupling  model  (see  (18) -(20)).  Instead,  affects 
Ti  indirectly  via  i//.  The  improved  by  the  observational 
constraint  increases  the  quality  of  y/  over  land  through  the 
dynamical  constraint.  Then,  the  improved  y/  ameliorates 
through  the  process  of  the  external  forcing. 

5.5.  Removal  of  Observational  Data  in  the  Southern  Ocean.  In 
the  real  ocean,  the  observations  are  scarce  in  the  southern 
polar  region.  Therefore,  another  set  of  data  assimilation 
experiment  is  carried  out,  which  is  the  same  as  the  experi¬ 
ment  in  Section  5.3,  but  in  which  the  observations,  south  of 
50°S  and  50°E~300°E,  are  removed  completely. 

Figures  12(a)-12(d)  show  the  time  series  of  RMSEs  of  i//, 
0,  T^,  and  with  the  GDF  (black  line)  and  the  standard 
DF  with  a,b  -  0.5  (red  line).  The  RMSE  of  for  the  DF 
increases  persistently  during  the  experimental  period,  while 
the  RMSE  for  the  GDF  begins  to  descend  after  0.2  years  and 
converges  after  0.6  years  (Figure  12(c)).  When  the  diffusion 
coefficients  are  set  to  different  values  in  the  DF  experiment 
(e.g.,  a,  =  0.2, 0.8, 1.0),  similar  results  as  the  ones  presented 
in  Figure  12  are  obtained.  Results  indicate  that  DF  cannot 
correct  the  model  bias  in  the  data  void  region.  However, 
the  GDF  is  able  to  mitigate  the  model  bias  to  some  degree 
through  extracting  the  spatial  multiscale  information  from 
the  available  observations  to  the  data  void  region.  Figure  13 
presents  the  spatial  distributions  of  RMSEs  of  for  the  GDF 
and  the  standard  DF  with  a,b  -  0.5.  Compared  to  the  DF, 
the  GDF  produces  a  significant  improvement  within  the  data 
void  region  in  the  Southern  Ocean  (compare  Figures  13(a) 
and  13(b)). 

The  RMSE  of  y/  in  the  GDF  is  not  always  smaller  than  the 
DF  owing  to  the  strong  nonlinear  nature  of  the  high  frequent 
atmosphere  (red  line  versus  black  line  in  Figure  12(a)).  In 
contrast,  the  evolution  of  in  the  model  (see  (20))  is 
rather  simple  (i.e.,  linear);  the  RMSE  in  the  GDF  is  almost 
always  smaller  than  that  for  the  DF  (red  line  versus  black 
line  in  Figure  12(d)).  For  the  low  frequent  component  0 
(see  Figure  12(b)),  the  RMSEs  of  both  the  GDF  and  the  DF 
decrease  gradually,  indicating  that  the  effect  of  the  data  void 
region  on  the  low  frequent  signal  is  small  in  the  given  time 
scale. 

5. 6.  Impact  of  the  GDF  on  the  Forecast.  From  a  more  practical 
point  of  view,  the  role  of  the  GDF  should  be  judged  from 
the  model  forecast.  In  this  section,  two  forecast  experiments 
without  any  observational  constraint  are  integrated  for  1  year, 
respectively,  starting  from  the  final  analyzed  states  of  the 
above  two  assimilation  experiments  (the  GDF  and  the  DF). 

Figures  14(a)-14(d)  show  the  forecasted  time  series  of 
RMSEs  of  y/y  0,  T^,  and  for  the  GDF  (black  line)  and  the 
DF  with  a,  =  0.5  (red  line).  The  GDF  performs  much 
better  than  the  DF  in  1  years  forecast  lead  time  of  all  the  state 
variables  such  as  the  high  frequent  component  y/  and  the  low 
frequent  component  0  (black  versus  red  curves  in  Figures 
14(a)  and  14(b)).  It  is  interesting  that  the  forecasted  RMSE 
of  0  still  decreases  inertially  owing  to  the  longer  adjustment 
time  of  the  low  frequent  signal,  but  whose  trend  becomes 
mildly  with  the  increase  of  the  forecasted  lead  time.  For 
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Figure  10:  Time  series  of  RMSEs  of  (a)  atmospheric  stream  function,  (b)  oceanic  stream  function,  (c)  sea  surface  temperature,  and  (d)  land 
surface  temperature  for  CTRL  (solid  curve)  and  GDF  (dashed  curve). 


Figure  11:  The  spatial  distribution  of  RMSE  of  sea  surface  tempera¬ 
ture  using  GDF. 


because  of  the  absence  of  the  observational  constraint, 
the  forecasted  RMSE  has  an  obvious  positive  trend  (see 
Figure  14(c)),  indicating  that  the  forecasted  state  is  gradually 
drifting  away  from  the  truth.  Anyway,  the  GDF  retains  its 
superiority  relative  to  the  DF  during  the  entire  forecasted  lead 
time.  The  forecasted  RMSEs  of  have  similar  patterns  to 
those  of  (see  Figure  14(d)). 

6.  Conclusions  and  Discussions 

In  this  study,  the  diffusion  filter  (DF)  is  introduced  as  a 
concrete  implementation  of  the  3DVAR  scheme.  Similar  to 
the  recursive  filter  (RF),  the  outstanding  issue  of  DF  is  its 


inefficiency  in  capturing  the  spatial  multiscale  information 
resolved  by  observations.  Therefore,  several  spatial  multiscale 
variational  analysis  schemes  based  on  the  DF  are  proposed  to 
retrieve  the  spatial  multiscale  information  from  longwaves  to 
shortwaves.  As  one  of  the  spatial  multiscale  variational  anal¬ 
ysis  schemes,  the  gradient  diffusion  filter  (GDF)  scheme  is 
proposed  and  verified  through  a  set  of  observing/ assimilation 
system  simulation  experiments,  where  the  “truth”  field  of  the 
sea  surface  temperature  is  represented  by  a  high  nonlinear 
analytic  function  in  a  given  sea  region,  and  the  observations 
are  sampled  randomly  and  uniformly  in  the  whole  domain. 
Results  of  the  assimilation  experiments  indicate  that  the 
GDF  has  noticeable  advantages  over  the  standard  RF  and  DF 
schemes,  especially  in  the  data  void  region.  The  GDF  can 
retrieve  the  longwave  information  over  the  whole  domain 
and  the  shortwave  information  over  data-dense  regions. 

After  that,  a  perfect  twin  experiment  framework  is 
designed  to  study  the  effect  of  the  GDF  on  the  state  estimation 
based  on  an  intermediate  atmosphere-ocean-land  coupled 
model.  In  this  framework,  the  assimilation  model  is  subject 
to  “biased”  initial  fields  from  the  “truth”  model.  The  RMSE 
of  the  sea  surface  temperature  can  be  reduced  significantly 
through  the  observational  constraint  via  the  GDF.  At  the 
same  time,  the  RMSEs  of  the  other  model  components,  such 
as  the  land  surface  temperature  and  the  atmospheric  and 
oceanic  stream  functions  can  also  be  mitigated  by  the  dynam¬ 
ical  constraint  and  the  external  constraint  through  the  ocean- 
atmosphere-land  coupled  process.  For  simulating  the  real 
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Figure  12:  Time  series  of  RMSEs  of  (a)  atmospheric  stream  function,  (b)  oceanic  stream  function,  (c)  sea  surface  temperature,  and  (d)  land 
surface  temperature  for  GDF  (black  curve)  and  DF  with  a,b  =  0.5  (red  curve). 
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Figure  13:  The  spatial  distributions  of  RMSEs  of  SST  using  (a)  GDF  and  (b)  DF  with  a,b  =  0.5. 
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observational  networks  in  the  world  ocean  roughly,  the 
observations  locating  in  the  Southern  Ocean  are  removed 
to  investigate  the  role  of  the  GDF  in  retrieving  the  multi¬ 
scale  information  from  observations.  While  the  standard  DF 
hardly  removes  the  model  bias  in  the  data  void  region,  the 
GDF  may  mitigate  the  model  bias  to  some  degree  through 
extracting  the  multiscale  information  from  the  observations 
beyond  the  data  void  region.  In  addition,  the  higher  forecast 
skill  can  also  be  obtained  through  the  better  initial  state  fields 
produced  by  the  GDF. 

It  should  be  noted  that  the  background  term  is  omitted 
in  the  above  assimilation  experiments.  When  high-density, 
accurate,  resolvable  information  is  available  in  observa¬ 
tional  datasets,  it  is  much  essential  to  extract  the  multiscale 
information  from  the  observations  with  deterministic  data 
assimilation  approaches,  as  this  study  does.  High-quality 


background  fields  can  be  obtained  firstly  when  determinis¬ 
tic  data  assimilation  approaches  are  carried  out.  Next,  the 
statistical  data  assimilation  approaches,  such  as  traditional 
3DVar  and  4Dvar,  can  be  used  to  treat  observations  as 
random  variables,  in  which  4  will  be  included  to  extract 
the  information  that  cannot  be  resolved  by  the  observation 
networks. 

In  spite  of  the  promising  results  produced  by  the  GDF 
in  the  intermediate  climate  model,  much  work  is  needed 
to  explore  the  impact  of  the  multiscale  variational  analysis 
schemes  on  the  state  estimation  and  forecast  in  real  applica¬ 
tions  using  general  circulation  models  (GCMs).  In  addition, 
other  spatial  multiscale  variational  analysis  schemes  based  on 
the  DF,  such  as  the  adaptive  diffusion  filter  (ADF)  scheme, 
should  also  be  studied  to  further  improve  the  convergence 
speed  and  accuracy. 
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Figure  14:  Forecasted  time  series  of  RMSEs  of  (a)  atmospheric  stream  function,  (b)  oceanic  stream  function,  (c)  sea  surface  temperature,  and 
(d)  land  surface  temperature  for  GDF  (black  curve)  and  DF  with  a,b  =  0.5  (red  curve). 


Appendix 

Equivalence  between  the  RF  and  DF  Methods 

Using  ADI  scheme,  (9)  can  be  discretized  as  follows: 


A  n+l/l  r\ 

(A.8) 

~r  n+l/2  fx 

A^Ujj'  =0, 

(A.9) 

A  n+l/l  ^ 

(A.10) 

^  2  X  \  t,J  X  t,J  ) 

n  €  [0,N-l] ;  i  €  [1,7-1] ;  j  e  [1,7-1], 


(A.1) 


t/2 


(A.2) 


Ayulf^^  =0  n  e  [0,  N  -  1] ,  (A.ll) 

where  A^,  A^,  A^,  are  forward  and  backward  difference 
operator  in  x  and  y  direction,  respectively,  r  is  the  time  step, 
iyj  and  J,  /  are  grid  index  and  grid  numbers  in  x  and  y 
direction,  respectively,  and  n  and  N  are  the  time  index  and 
the  total  time  step  numbers.  The  common  tridiagonal  matrix 
algorithm  (TDMA)  can  be  used  to  solve  both  (A.l)  and  (A.2). 
For  example,  the  tridiagonal  equation  (A.2)  in  the  jth  row  can 
be  written  as  follows: 


Au  =  y, 


(A.12) 


nG  [0,N-1];  f  G  [1,1-1];  jG  [1,/-1], 

where 


ulj  =  Wjj  i  6  [0, 7] ;  ;■  e  [0,  /] , 
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Specially,  if  ^  is  a  constant,  which  is  equivalent  to  an 
isotropic  filter,  we  know  that 


=  fe  [1,7-2], 

qi  =  /  G  [1,7-2], 

p  +  q=l. 

Set  a  =  -qipy  then  (18)  and  (19)  can  be  formulated  as 


(A.17) 


li  =  - 

<Pi+i  =  <^<Pi  +  (1  -  a)  yi+i. 

f  =  1,2,  ...,7-2, 

(A.18) 

Ui  -  -h  (1  -  a)  (pi. 

/  =  7-2,7-l,...,l. 

1  f  6  [2,1-2] 

rUi  -  - 
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Equation  (A.18)  has  the  same 

form  as  (6)  in  the  RFM. 
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It  is  easily  testified  that  A  is  a  positive  definite  and  symmetri¬ 
cal  matrix.  Therefore,  the  Cholesky  decomposition  of  A  can 
be  processed  as  follows: 
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